Strong Law Of Large Numbers

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a ''large number'' of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).

It is also important to note that the LLN only applies to the average. Therefore, while
: \lim_{n\to\infty} \sum_{i=1}^n \frac{X_i}{n} = \overline{X}
other formulas that look similar are not verified, such as the raw deviation from "theoretical results":
: \sum_{i=1}^n X_i - n\times\overline{X}
Not only does this deviation fail to converge toward zero as ''n'' increases, it tends to increase in absolute value as ''n'' increases.


Examples

For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of the average of the rolls is:
: \frac{1+2+3+4+5+6}{6} = 3.5
According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) will approach 3.5, with the precision increasing as more dice are rolled.

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of ''n'' such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after ''n'' flips will almost surely converge to 1/2 as ''n'' approaches infinity.

Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.

Another good example of the LLN is the Monte Carlo method. These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. The method is important mainly because it is sometimes difficult or impossible to use other approaches.
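
The die and coin examples can be illustrated with a small simulation. The following is a minimal sketch, not part of the original article: the helper names `running_means`, `roll_die` and `flip_coin` and the fixed seeds are assumptions made for the illustration, and pseudo-random numbers stand in for real trials.

```python
import random


def running_means(draw, n, seed=0):
    """Return the running sample means of n independent draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += draw(rng)
        means.append(total / i)
    return means


def roll_die(rng):
    """One roll of a fair six-sided die (expected value 3.5)."""
    return rng.randint(1, 6)


def flip_coin(rng):
    """One fair coin flip: 1 for heads, 0 for tails (expected value 1/2)."""
    return 1 if rng.random() < 0.5 else 0


if __name__ == "__main__":
    n = 100_000
    die_means = running_means(roll_die, n, seed=1)
    coin_means = running_means(flip_coin, n, seed=2)
    for k in (10, 1_000, 100_000):
        heads = round(coin_means[k - 1] * k)
        # Columns: trials, average die value, proportion of heads,
        # absolute difference between the numbers of heads and tails.
        print(k, die_means[k - 1], coin_means[k - 1], abs(2 * heads - k))
```

In a typical run the first two columns settle near 3.5 and 1/2, while the last column (the raw difference between heads and tails) does not shrink, in line with the remark above about absolute differences.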


Limitation

The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of ''n'' results taken from the Cauchy distribution or some Pareto distributions (''α'' < 1) will not converge as ''n'' becomes larger; the reason is heavy tails. The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation, whereas the expectation of the Pareto distribution (''α'' < 1) is infinite.

One way to generate the Cauchy-distributed example is to take random numbers equal to the tangent of an angle uniformly distributed between −90° and +90°. The median is zero, but the expected value does not exist, and indeed the average of ''n'' such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as ''n'' goes to infinity.

If the trials embed a selection bias, as is typical in human economic or rational behaviour, the law of large numbers does not help to remove the bias. Even if the number of trials is increased, the selection bias remains.
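
The Cauchy construction described above (the tangent of a uniformly distributed angle) can be tried numerically. This is a hedged sketch: the function names, the seeds and the choice of the median absolute value as a summary of spread are assumptions made for this illustration. Because the average of ''n'' such variables has the same distribution as a single one, the typical size of the sample mean should stay near 1 however large ''n'' is.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """Tangent of an angle uniform on (-90 deg, +90 deg): a standard Cauchy variate."""
    angle = rng.uniform(-math.pi / 2, math.pi / 2)
    return math.tan(angle)


def sample_mean(n, rng):
    """Average of n independent Cauchy draws."""
    return sum(cauchy_draw(rng) for _ in range(n)) / n


def median_abs_mean(n, replications, seed=0):
    """Median of |sample mean| over many independent replications."""
    rng = random.Random(seed)
    return statistics.median(
        abs(sample_mean(n, rng)) for _ in range(replications)
    )


if __name__ == "__main__":
    for n in (1, 10, 100):
        # The spread does not shrink with n, unlike for finite-mean distributions.
        print(n, median_abs_mean(n, replications=2000, seed=n))
```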


History

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials. This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli. It took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in his ''Ars Conjectandi'' (''The Art of Conjecturing'') in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name ''la loi des grands nombres'' ("the law of large numbers"). Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true. These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.


Forms

There are two different versions of the law of large numbers that are described below. They are called the ''strong law of large numbers'' and the ''weak law of large numbers''. Stated for the case where X_1, X_2, ... is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value E(X_1) = E(X_2) = ... = ''μ'', both versions of the law state that the sample average
: \overline{X}_n=\frac1n(X_1+\cdots+X_n)
converges to the expected value:
: \overline{X}_n \to \mu \quad\text{as}\ n \to \infty.
(Lebesgue integrability of X_j means that the expected value E(X_j) exists according to Lebesgue integration and is finite. It does ''not'' mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

Introductory probability texts often additionally assume identical finite variance \operatorname{Var}(X_i)=\sigma^2 (for all i) and no correlation between random variables. In that case, the variance of the average of ''n'' random variables is
: \operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},
which can be used to shorten and simplify the proofs. This assumption of finite variance is ''not necessary''. Large or infinite variance will make the convergence slower, but the LLN holds anyway.

Mutual independence of the random variables can be replaced by pairwise independence or exchangeability in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.
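
The identity Var(X̄_n) = σ²/n for uncorrelated variables with common variance can be checked empirically. The sketch below is an illustration under stated assumptions: it uses uniform variables on [0, 1) (variance 1/12), and the function name `variance_of_mean`, the seeds and the replication count are choices made here, not taken from the article.

```python
import random
import statistics

UNIFORM_VARIANCE = 1 / 12  # variance of a Uniform(0, 1) random variable


def variance_of_mean(n, replications=5000, seed=0):
    """Empirical variance of the average of n i.i.d. Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.random() for _ in range(n)) / n
        for _ in range(replications)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    for n in (1, 10, 40):
        # Compare the simulated variance with the theoretical sigma^2 / n.
        print(n, variance_of_mean(n, seed=n), UNIFORM_VARIANCE / n)
```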


Weak law

The weak law of large numbers (also called Khinchin's law) states that the sample average converges in probability towards the expected value:
: \overline{X}_n\ \xrightarrow{P}\ \mu \qquad\text{when}\ n \to \infty.
That is, for any positive number ''ε'',
: \lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.
Interpreting this result, the weak law states that for any nonzero margin specified (''ε''), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first ''n'' values goes to zero as ''n'' goes to infinity.

As an example, assume that the ''n''th random variable in the series follows a Gaussian distribution with mean zero, but with variance equal to 2n/\log(n+1), which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to n^2/\log n. The variance of the average is therefore asymptotic to 1/\log n and goes to zero.

There are also examples of the weak law applying even though the expected value does not exist.
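
The asymptotic claims in the Gaussian example can be checked with plain arithmetic. The following sketch (the function name `variance_of_average` is an assumption for this illustration) sums the variances 2k/log(k+1) directly and compares the variance of the average with 1/log n. The ratio approaches 1 only slowly, since the correction terms are of relative order 1/log n.

```python
import math


def variance_of_average(n):
    """Variance of the average of independent X_1..X_n with Var(X_k) = 2k / log(k + 1)."""
    variance_of_sum = sum(2 * k / math.log(k + 1) for k in range(1, n + 1))
    return variance_of_sum / n**2


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        v = variance_of_average(n)
        # Second column shrinks toward 0; third column drifts toward 1.
        print(n, v, v * math.log(n))
```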


Strong law

The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value:
: \overline{X}_n\ \xrightarrow{\text{a.s.}}\ \mu \qquad\text{when}\ n \to \infty.
That is,
: \Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.
What this means is that the probability that, as the number of trials ''n'' goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

It is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the convergence is only weak (in probability). See differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on ''something'' (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).

If the summands are independent but not identically distributed, then
: \overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \xrightarrow{\text{a.s.}}\ 0,
provided that each X_k has a finite second moment and
: \sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.
This statement is known as ''Kolmogorov's strong law''.


Differences between the weak law and the strong law

The ''weak law'' states that for a specified large ''n'', the average \overline{X}_n is likely to be near ''μ''. Thus, it leaves open the possibility that |\overline{X}_n -\mu| > \varepsilon happens an infinite number of times, although at infrequent intervals. (Not necessarily |\overline{X}_n -\mu| \neq 0 for all ''n''.)

The ''strong law'' shows that this almost surely will not occur. Note that it does not imply that, with probability 1, for any ''ε'' > 0 the inequality |\overline{X}_n -\mu| < \varepsilon holds for all ''n'' beyond one fixed threshold: the threshold may depend on the outcome, since the convergence is not necessarily uniform on the set where it holds.

There are cases in which the strong law does not hold but the weak law does.


Uniform law of large numbers

Suppose ''f''(''x'',''θ'') is some function defined for ''θ'' ∈ Θ, and continuous in ''θ''. Then for any fixed ''θ'', the sequence {''f''(X_1,''θ''), ''f''(X_2,''θ''), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[''f''(''X'',''θ'')]. This is the ''pointwise'' (in ''θ'') convergence.

The uniform law of large numbers states the conditions under which the convergence happens ''uniformly'' in ''θ''. If
# Θ is compact,
# ''f''(''x'',''θ'') is continuous at each ''θ'' ∈ Θ for almost all ''x'', and is a measurable function of ''x'' at each ''θ'',
# there exists a dominating function ''d''(''x'') such that E[''d''(''X'')] < ∞, and
::: \left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta.
Then E[''f''(''X'',''θ'')] is continuous in ''θ'', and
: \sup_{\theta\in\Theta} \left\| \frac1n\sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \xrightarrow{\mathrm{P}} \ 0.
This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
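
A concrete instance may help fix ideas. The sketch below is an illustration under assumptions that are not in the article: ''X'' is uniform on [0, 1), ''f''(''x'',''θ'') = (''x'' − ''θ'')² and Θ = [0, 1], so that ''d''(''x'') = 1 is a dominating function and E[''f''(''X'',''θ'')] = 1/12 + (''θ'' − 1/2)². The supremum over Θ is approximated by a maximum over a finite grid, which is a simplification.

```python
import random


def sup_deviation(n, grid_size=21, seed=0):
    """Max over a theta grid of |sample mean of f(X_i, theta) - E f(X, theta)|."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]  # i.i.d. Uniform(0, 1) sample
    worst = 0.0
    for j in range(grid_size):
        theta = j / (grid_size - 1)  # grid point in the compact set [0, 1]
        sample_mean = sum((x - theta) ** 2 for x in xs) / n
        expectation = 1 / 12 + (theta - 0.5) ** 2
        worst = max(worst, abs(sample_mean - expectation))
    return worst


if __name__ == "__main__":
    for n in (100, 2_000, 20_000):
        # The worst-case deviation over the grid tends to shrink as n grows.
        print(n, sup_deviation(n, seed=n))
```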


Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if ''E'' denotes the event in question, ''p'' its probability of occurrence, and N_n(E) the number of times ''E'' occurs in the first ''n'' trials, then with probability one (Wen, L., "An Analytic Technique to Prove Borel's Strong Law of Large Numbers", Am Math Month, 1991),
: \frac{N_n(E)}{n}\to p\text{ as }n\to\infty.
This theorem makes rigorous the intuitive notion of probability as the long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

Chebyshev's inequality. Let ''X'' be a random variable with finite expected value ''μ'' and finite non-zero variance \sigma^2. Then for any real number ''k'' > 0,
: \Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.


Proof of the weak law

Given X_1, X_2, ..., an infinite sequence of i.i.d. random variables with finite expected value E(X_1)=E(X_2)=\cdots=\mu<\infty, we are interested in the convergence of the sample average
: \overline{X}_n=\tfrac1n(X_1+\cdots+X_n).
The weak law of large numbers states:
: \overline{X}_n\ \xrightarrow{P}\ \mu \qquad\text{when}\ n \to \infty.


Proof using Chebyshev's inequality assuming finite variance

This proof uses the assumption of finite variance \operatorname{Var}(X_i)=\sigma^2 (for all i). The independence of the random variables implies no correlation between them, and we have that
: \operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
The common mean ''μ'' of the sequence is the mean of the sample average:
: E(\overline{X}_n) = \mu.
Using Chebyshev's inequality on \overline{X}_n results in
: \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{n\varepsilon^2}.
This may be used to obtain the following:
: \operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) = 1 - \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.
As ''n'' approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained
: \overline{X}_n\ \xrightarrow{P}\ \mu \qquad\text{when}\ n \to \infty.
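
The Chebyshev bound used in this proof can be compared with simulated tail frequencies. The following is a minimal sketch for fair die rolls (''μ'' = 3.5, \sigma^2 = 35/12); the function names, the seed and the replication count are assumptions made for the illustration. The bound is usually far from tight, but it suffices for the proof because it tends to zero as ''n'' grows.

```python
import random

DIE_MEAN = 3.5
DIE_VARIANCE = 35 / 12  # variance of one fair six-sided die roll


def chebyshev_bound(n, eps):
    """Upper bound sigma^2 / (n * eps^2) on P(|mean - mu| >= eps), capped at 1."""
    return min(1.0, DIE_VARIANCE / (n * eps**2))


def empirical_tail(n, eps, replications=2000, seed=0):
    """Fraction of simulated sample means of n rolls that miss mu by at least eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - DIE_MEAN) >= eps:
            hits += 1
    return hits / replications


if __name__ == "__main__":
    for n in (10, 50, 200):
        # The simulated frequency should stay below the Chebyshev bound.
        print(n, empirical_tail(n, 0.5), chebyshev_bound(n, 0.5))
```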


Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.

All X_1, X_2, ... have the same characteristic function, so we will simply denote this \varphi_X. Among the basic properties of characteristic functions there are

\varphi_{\frac{1}{n} X}(t) = \varphi_X(\tfrac{t}{n}) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t) \varphi_Y(t) \quad \text{if } X \text{ and } Y \text{ are independent.}

These rules can be used to calculate the characteristic function of \overline{X}_n in terms of \varphi_X:

\varphi_{\overline{X}_n}(t) = \left[\varphi_X\left(\tfrac{t}{n}\right)\right]^n = \left[1 + i\mu\tfrac{t}{n} + o\left(\tfrac{t}{n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \text{as} \quad n \rightarrow \infty.

The limit e^{it\mu} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, \overline{X}_n converges in distribution to μ:

\overline{X}_n \, \xrightarrow{\mathcal{D}} \, \mu \qquad\text{for}\qquad n \to \infty.

μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,

\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\text{when}\qquad n \to \infty.

This shows that the sample mean converges in probability to \mu = -i\varphi_X'(0), that is, to the derivative of the characteristic function at the origin divided by i, as long as the latter exists.
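The limit used in this proof can be checked numerically. The sketch below is only an illustration under an assumed example distribution: it takes X to be exponential with rate 1 (so μ = 1 and \varphi_X(t) = 1/(1 - it)), a choice that does not come from the text above. The function names are hypothetical. It evaluates [\varphi_X(t/n)]^n for growing n and reports its distance from e^{it\mu}.

```python
import cmath


def phi_exponential(t: float) -> complex:
    """Characteristic function of an Exponential(rate=1) variable (assumed example, mean 1)."""
    return 1 / (1 - 1j * t)


def phi_sample_mean(t: float, n: int) -> complex:
    """Characteristic function of the mean of n i.i.d. copies: [phi_X(t/n)]^n."""
    return phi_exponential(t / n) ** n


def gap_to_limit(t: float, n: int, mu: float = 1.0) -> float:
    """Distance between phi of the sample mean and the limit e^{i t mu}."""
    return abs(phi_sample_mean(t, n) - cmath.exp(1j * t * mu))


if __name__ == "__main__":
    # The gap should shrink as n grows, for any fixed t.
    for n in (1, 10, 100, 1000, 10000):
        print(n, gap_to_limit(2.0, n))
```

For a fixed t the printed gap should shrink roughly like 1/n, which matches the o(t/n) remainder in the expansion. This only illustrates the convergence of characteristic functions for one distribution; it is not a substitute for the Lévy continuity argument.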


Consequences

The law of large numbers makes it possible to recover not only the expectation of an unknown distribution from a realization of the sequence, but also any other feature of the probability distribution. By applying Borel's law of large numbers, one can obtain the probability mass function: for each outcome in the support of the probability mass function, the probability of that outcome is approximated by the proportion of trials in which it occurs. The larger the number of repetitions, the better the approximation. In the continuous case, take C=(a-h,a+h] for a small positive h, let f be the probability density function of X, and let N_n(C) denote the number of the first n observations that fall in C. Then, for large n:

\frac{N_n(C)}{n} \thickapprox p = P(X\in C) = \int_{a-h}^{a+h} f(x)\,dx \thickapprox 2hf(a)

With this method, one can cover the whole x-axis with a grid (with grid size 2h) and obtain a bar graph, which is called a histogram.
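The density approximation can be tried out in a short simulation. The sketch below is a minimal illustration, assuming a standard normal distribution as the "unknown" law; the function names, the seed, and the values of a and h are hypothetical choices, not taken from the text. It counts the fraction of samples falling in (a-h, a+h] and divides by 2h to estimate f(a).

```python
import math
import random


def density_estimate(samples, a, h):
    """Estimate f(a) as (fraction of samples in (a-h, a+h]) / (2h)."""
    count = sum(1 for x in samples if a - h < x <= a + h)
    return count / len(samples) / (2 * h)


def standard_normal_pdf(x):
    """True density of the assumed example distribution, for comparison."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    a, h = 0.5, 0.1
    for n in (100, 10_000, 1_000_000):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, density_estimate(samples, a, h), standard_normal_pdf(a))
```

Two sources of error are visible here: the sampling error, which the law of large numbers drives to zero as n grows, and the bias from replacing the integral by 2hf(a), which only shrinks when h is made smaller.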


See also

* Asymptotic equipartition property
* Central limit theorem
* Infinite monkey theorem
* Law of averages
* Law of the iterated logarithm
* Law of truly large numbers
* Lindy effect
* Regression toward the mean
* Sortition
* Strong law of small numbers




External links

* Animations for the Law of Large Numbers by Yihui Xie, using the R package animation
* Apple CEO Tim Cook said something that would make statisticians cringe. "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody [...]" said Tim Cook, while Business Insider explained: "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform."